Question sets

Question 1

You suspect a casino coin is unfair. Let \(p\) be the probability of the coin landing on heads (1).

1.1

Problem: After five trials, you observe the sequence \([1, 1, 0, 0, 0]\). Please derive the Maximum Likelihood Estimate (MLE) for the probability of getting heads.

In a sequence of \(n\) independent Bernoulli trials, the likelihood function for \(p\) is: \[ L(p) = p^k (1-p)^{n-k} \]

where \(n=5\) and \(k=2\) (the number of heads).

To find the MLE, we maximize the log-likelihood \(l(p)\): \[ l(p) = 2 \ln(p) + 3 \ln(1-p) \]

Taking the first derivative with respect to \(p\) and setting it to zero: \[ \begin{aligned} \frac{dl}{dp} =& \frac{2}{p} - \frac{3}{1-p} = 0 \\ 2(1-p) =& 3p \implies 2 - 2p = 3p \implies 5p = 2 \\ \hat{p}_{MLE} =& \frac{2}{5} = 0.4 \end{aligned} \]
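As a quick numerical sanity check (a sketch, not part of the derivation; the grid search below is our own addition), maximizing \(l(p) = k\ln p + (n-k)\ln(1-p)\) over a grid of candidate values recovers \(k/n\):

```python
import numpy as np

# Observed sequence from the problem; k heads out of n flips.
flips = [1, 1, 0, 0, 0]
k, n = sum(flips), len(flips)

# Evaluate l(p) = k*ln(p) + (n-k)*ln(1-p) on a grid of candidate p values.
p_grid = np.linspace(0.001, 0.999, 999)
log_lik = k * np.log(p_grid) + (n - k) * np.log(1 - p_grid)
p_mle = p_grid[np.argmax(log_lik)]

print(round(p_mle, 3))  # 0.4, matching the analytic answer k/n
```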

1.2

This time we incorporate a prior: the probability of the coin landing on heads (1) follows a Beta distribution (i.e., \(p \sim B(2,8)\)). What is the maximum a posteriori (MAP) estimate of the probability of getting heads?

The Beta distribution is the conjugate prior for the Binomial likelihood. If the prior is \(\text{Beta}(\alpha, \beta)\) and we observe \(k\) successes in \(n\) trials, the posterior is: \[ p | \text{data} \sim \text{Beta}(\alpha + k, \beta + n - k) \]

Plugging in the values (\(\alpha=2, \beta=8, k=2, n=5\)): \[ p | \text{data} \sim \text{Beta}(2 + 2, 8 + 3) = \text{Beta}(4, 11) \]

The MAP estimate is

\[ \hat{p}_{MAP} = \frac{4 - 1}{4 + 11 - 2} = \frac{3}{13} \approx 0.231 \]

Conclusion: The MAP estimate is \(\approx 0.231\), which shifts the MLE (\(0.4\)) toward the prior mean (\(0.2\)).

1.3

How do a weak and a stronger prior (say, \(p \sim B(20,80)\); note that the prior belief is still a 0.2 chance of heads) affect the MAP estimate?

Both \(\text{Beta}(2, 8)\) and \(\text{Beta}(20, 80)\) have the same mean (\(0.2\)), but the latter has a much smaller variance, representing a “stronger” or more certain belief.

  • Weak Prior \(\text{Beta}(2, 8)\): As calculated above, \(\hat{p}_{MAP} \approx 0.231\). The data has a significant impact on the estimate.
  • Strong Prior \(\text{Beta}(20, 80)\): The new posterior is \(\text{Beta}(20+2, 80+3) = \text{Beta}(22, 83)\). \[ \hat{p}_{MAP} = \frac{22 - 1}{22 + 83 - 2} = \frac{21}{103} \approx 0.204 \]

A stronger prior is more resistant to change from new data. Even though we observed \(40\%\) heads in our sample, the strong prior dominates the calculation, resulting in an estimate (\(0.204\)) much closer to the prior mean (\(0.20\)) than the weak prior estimate (\(0.231\)).
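Both estimates can be reproduced with the posterior-mode formula \(\hat{p}_{MAP} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2}\) (the helper name below is our own):

```python
def beta_binomial_map(alpha, beta, k, n):
    """Mode of the Beta(alpha + k, beta + n - k) posterior."""
    return (alpha + k - 1) / (alpha + beta + n - 2)

k, n = 2, 5  # 2 heads in 5 flips, as observed above
weak = beta_binomial_map(2, 8, k, n)      # weak prior Beta(2, 8)
strong = beta_binomial_map(20, 80, k, n)  # strong prior Beta(20, 80)

print(round(weak, 3), round(strong, 3))  # 0.231 0.204
```

The strong-prior estimate sits much closer to the prior mean 0.2, as described above.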

Question 2

For a linear regression model (for simplicity we do not consider the intercept in this case), \(y=x\beta_1+\epsilon\), where \(\epsilon \sim N(0,\sigma^2)\). Implicitly, \(y \sim N(x\beta_1,\sigma^2)\).

2.1

Please show that the log-likelihood for \(N\) observations \((y_i,x_i)\) given \(\beta_1\) is

\[ l(\beta_1,\sigma^2,y,x) = -\frac{N}{2}\ln(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i-x_i\beta_1)^2 \]

Since \(\epsilon_i \sim N(0, \sigma^2)\), the response variable follows the distribution \(y_i \sim N(x_i\beta_1, \sigma^2)\). The probability density function (PDF) for a single observation is: \[ f(y_i | x_i, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - x_i\beta_1)^2}{2\sigma^2} \right) \]

Assuming the observations are independent and identically distributed (i.i.d.), the likelihood function \(L(\beta_1, \sigma^2)\) is the product of the individual densities: \[ \begin{aligned} L(\beta_1, \sigma^2) =& \prod_{i=1}^N \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - x_i\beta_1)^2}{2\sigma^2} \right)\\ L(\beta_1, \sigma^2) =& (2\pi\sigma^2)^{-N/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - x_i\beta_1)^2 \right) \end{aligned} \]

Taking the natural logarithm to find the log-likelihood \(l = \ln(L)\):

\[ \begin{aligned} l(\beta_1, \sigma^2) =& \ln \left[ (2\pi\sigma^2)^{-N/2} \right] + \ln \left[ \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - x_i\beta_1)^2 \right) \right] \\ l(\beta_1, \sigma^2) =& -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^N(y_i - x_i\beta_1)^2 \end{aligned} \]
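The derived expression can be checked against scipy's normal log-density on simulated data (a sketch; the sample size, \(\beta_1\), and \(\sigma^2\) values below are arbitrary choices for illustration, not from the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta1, sigma2, N = 1.5, 0.64, 50
x = rng.normal(size=N)
y = x * beta1 + rng.normal(scale=np.sqrt(sigma2), size=N)

# Log-likelihood from the closed-form expression derived above.
ll_formula = -N / 2 * np.log(2 * np.pi * sigma2) \
             - np.sum((y - x * beta1) ** 2) / (2 * sigma2)
# Same quantity via the sum of individual normal log-densities.
ll_scipy = stats.norm.logpdf(y, loc=x * beta1, scale=np.sqrt(sigma2)).sum()

print(np.isclose(ll_formula, ll_scipy))  # True
```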

2.2

Following question 2.1, please show that the MLE for \(\beta_1\) is

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^N x_i y_i}{\sum_{i=1}^N x_i^2} \]

To find the MLE \(\hat{\beta}_1\), we maximize the log-likelihood by taking the partial derivative with respect to \(\beta_1\) and setting it to zero:

  1. Differentiate: \[ \frac{\partial l}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \left[ -\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - x_i\beta_1)^2 \right] \]

Applying the chain rule

\[ \begin{aligned} \frac{\partial l}{\partial \beta_1} =& -\frac{1}{2\sigma^2} \sum_{i=1}^N 2(y_i - x_i\beta_1)(-x_i) \\ =& \frac{1}{\sigma^2} \sum_{i=1}^N (x_i y_i - x_i^2 \beta_1) \end{aligned} \]

  2. Set to Zero: \[ \frac{1}{\sigma^2} \left( \sum_{i=1}^N x_i y_i - \hat{\beta}_1 \sum_{i=1}^N x_i^2 \right) = 0 \]

  3. Solve for \(\hat{\beta}_1\):

\[ \begin{aligned} \sum_{i=1}^N x_i y_i =& \hat{\beta}_1 \sum_{i=1}^N x_i^2 \\ \hat{\beta}_1 =& \frac{\sum_{i=1}^N x_i y_i}{\sum_{i=1}^Nx_i^2} \end{aligned} \]
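A quick numeric check (the data are simulated here; the true slope 2.0 and noise level are made up): the closed-form estimator agrees with numpy's generic least-squares solver on a no-intercept fit:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

# Closed-form MLE derived above: sum(x_i * y_i) / sum(x_i^2).
beta_closed = np.sum(x * y) / np.sum(x ** 2)
# Same no-intercept fit via the generic least-squares solver.
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(np.isclose(beta_closed, beta_lstsq))  # True
```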

2.3

The null model for a linear model is an intercept-only model, that is, \(y \sim N(\beta_0, \sigma^2)\). Please show that the log-likelihood for \(N\) observations under the null model is

\[ l(\beta_0,\sigma^2,y) = -\frac{N}{2}\ln(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i-\beta_0)^2 \]

Replace \(x_i\beta_1\) with \(\beta_0\) in the derivation of 2.1: each \(y_i \sim N(\beta_0, \sigma^2)\), and the same product-of-densities argument yields the stated log-likelihood.

2.4

Please show that the Likelihood Ratio Test (LRT) of the null hypothesis \(H_0: \beta_1=0\) and alternative hypothesis \(H_1: \beta_1\neq 0\) is

\[ \frac{1}{\sigma^2}\left(\sum_{i=1}^N(y_i-\beta_0)^2-\sum_{i=1}^N(y_i-x_i\beta_1)^2\right) \]

The LRT statistic is \(-2 \ln(\frac{L(H_0)}{L(H_1)}) = -2[l(H_0) - l(H_1)]\)

Substituting \(l(H_0)\) from 2.3 and \(l(H_1)\) from 2.1, the \(-\frac{N}{2}\ln(2\pi\sigma^2)\) terms cancel: \[ -2[l(H_0) - l(H_1)] = -2\left[-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i-\beta_0)^2 + \frac{1}{2\sigma^2}\sum_{i=1}^N(y_i-x_i\beta_1)^2\right] = \frac{1}{\sigma^2}\left(\sum_{i=1}^N(y_i-\beta_0)^2 - \sum_{i=1}^N(y_i-x_i\beta_1)^2\right) \]

2.5

In the machine learning perspective, we usually ask the model to minimize the mean squared error (MSE, \(\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2\), where \(\hat{y}_i\) is the predicted value, \(x_i\hat{\beta}_1\) in our case). Please describe why minimizing the MSE is equivalent to finding the MLE.

In the MLE calculation, the only term of the log-likelihood involving \(\beta_1\) is \(-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i-x_i\beta_1)^2\), so maximizing the log-likelihood is equivalent to minimizing \(\sum_{i=1}^N(y_i-x_i\beta_1)^2\), which is exactly \(N\) times the MSE.
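The equivalence can also be seen numerically (a sketch; the data below are simulated with an arbitrary slope): the \(\beta_1\) that minimizes the MSE coincides with the closed-form MLE from 2.2:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = -0.7 * x + rng.normal(scale=0.3, size=200)

# Minimize the MSE directly as a function of beta1...
mse = lambda b: np.mean((y - x * b) ** 2)
beta_mse = minimize_scalar(mse).x
# ...and compare with the closed-form MLE.
beta_mle = np.sum(x * y) / np.sum(x ** 2)

print(abs(beta_mse - beta_mle) < 1e-6)  # True
```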

2.6

In typical linear regression, \(R^2\) or adjusted \(R^2\) is usually the more common measure of how good a model is. Please describe the advantages of the likelihood ratio test over \(R^2\).

Advantages include:

  1. Model comparison: the null model can be swapped for any other nested subset of predictors, so the LRT supports model selection directly
  2. A more general framework: the LRT generalizes to logistic, Poisson, and negative binomial regression as well

Question 3

A scientist is studying the effect of a new drug on the expression level of a specific gene. They have measured the expression levels in two small groups of mice: a Control group (\(n=3\)) and a Treatment group (\(m=3\)). The expression levels of these mice are:

  • Control (C): \(\{10, 12, 14\}\)
  • Treatment (T): \(\{18, 20, 22\}\)

We want to test whether the drug significantly increases the mean gene expression using a Permutation Test.

3.1

What are the null and alternative hypotheses of the test?

Null Hypothesis (\(H_0\)): \(\mu_C = \mu_T\)

There is no difference in the distribution of gene expression between the Control and Treatment groups.

Alternative Hypothesis (\(H_1\)): \(\mu_T > \mu_C\)

This is a one-tailed test.

3.2

What is the test statistic \(\Delta_{obs}\)?

\[ \begin{aligned} \Delta =& \bar{X}_T - \bar{X}_C \\ \Delta_{obs} =& 20 - 12 = 8 \end{aligned} \]

3.3

Please list at least 3 possible permutations and their corresponding test statistics (\(\Delta_{perm}\))

  • Permutation A (The observed case):

\[ T = \{18, 20, 22\}, C = \{10, 12, 14\} \implies \Delta = 8 \]

  • Permutation B (A “mixed” case):

\[ T = \{10, 18, 22\}, C = \{12, 14, 20\} \implies \bar{X}_T = 16.67, \bar{X}_C = 15.33 \implies \Delta = 1.33 \]

  • Permutation C (The “inverse” case): \[ T = \{10, 12, 14\}, C = \{18, 20, 22\} \implies \bar{X}_T = 12, \bar{X}_C = 20 \implies \Delta = -8 \]

3.4

If, after enumerating all \(\binom{6}{3} = 20\) possible permutations, only 1 (the original data) results in a difference \(\ge \Delta_{obs}\), what is the p-value?

The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the observed statistic, assuming \(H_0\) is true. \[ p = \frac{\text{Number of permutations where } \Delta \ge \Delta_{obs}}{\text{Total number of permutations}} = \frac{1}{20} = 0.05 \]
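The whole test can be verified by brute force (a sketch; the variable names are our own), enumerating every way to assign 3 of the 6 values to the treatment group:

```python
from itertools import combinations

data = [10, 12, 14, 18, 20, 22]
obs_delta = (18 + 20 + 22) / 3 - (10 + 12 + 14) / 3  # 8.0

# Every choice of 3 treatment values defines one permutation/split.
deltas = []
for treat in combinations(data, 3):
    control = [v for v in data if v not in treat]  # values are distinct here
    deltas.append(sum(treat) / 3 - sum(control) / 3)

# One-sided p-value: fraction of splits at least as extreme as observed.
p_value = sum(d >= obs_delta for d in deltas) / len(deltas)
print(len(deltas), p_value)  # 20 0.05
```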

3.5

What is the assumption of the permutation test?

Exchangeability: under \(H_0\), the group labels are arbitrary, so every relabeling of the observations is equally likely; the observed assignment is just one of these equally likely relabelings.

3.6

What are the advantages of the permutation test?

It is distribution-free: no assumption is made about the underlying distribution of the data, only exchangeability under \(H_0\).

Question 4

Suppose we have a set of independent and identically distributed (i.i.d.) observations \(X_1, X_2, \dots, X_n\) from a Normal distribution with known mean \(\mu\) and unknown variance \(\sigma^2\):

\[ X_i \sim N(\mu, \sigma^2) \]

Please show that MLE for the variance \(\sigma^2\) is: \[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \]

(Note: For this exercise, assume \(\mu\) is a known constant. If \(\mu\) were unknown, we would replace it with the sample mean \(\bar{X}\).)

The likelihood is: \[ \begin{aligned} L(\sigma^2) =& \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(X_i - \mu)^2}{2\sigma^2} \right) \\ =& \left( 2\pi\sigma^2 \right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2 \right) \end{aligned} \]

To simplify the differentiation, we calculate the log-likelihood: \[ \ell(\sigma^2) = \ln L(\sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2 \]

Take the derivative with respect to our parameter of interest, \(\sigma^2\), and set it to zero: \[ \frac{d}{d(\sigma^2)} \ell(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^n (X_i - \mu)^2 = 0 \]

Multiply by \(2(\sigma^2)^2\): \[ \begin{aligned} -n\sigma^2 + \sum_{i=1}^n (X_i - \mu)^2 = 0 \\ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \end{aligned} \]
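A numeric check (the data are simulated; \(\mu\) and the true variance are arbitrary choices): the closed-form estimate matches a grid maximization of \(\ell(\sigma^2)\):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, n = 5.0, 500
x = rng.normal(mu, 2.0, size=n)  # true sigma^2 = 4.0

# Closed-form MLE with known mu.
sigma2_hat = np.mean((x - mu) ** 2)

# Maximize the log-likelihood over a grid of candidate variances.
grid = np.linspace(0.5, 10, 2000)
ll = -n / 2 * np.log(2 * np.pi * grid) - np.sum((x - mu) ** 2) / (2 * grid)
sigma2_grid = grid[np.argmax(ll)]

print(abs(sigma2_hat - sigma2_grid) < 0.01)  # True, up to grid spacing
```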

Question 5

If you recall from the previous course, a more common approach for contingency tables is Pearson’s \(\chi^2\) test, but in this question we will do it in a likelihood-ratio manner. We are investigating a new treatment and its effect on curing cancer. We get the following contingency table:

|               | Cured             | Not cured         | Marginal          |
|---------------|-------------------|-------------------|-------------------|
| Treatment     | \(O_{11}\)        | \(O_{12}\)        | \(O_{11}+O_{12}\) |
| Not Treatment | \(O_{21}\)        | \(O_{22}\)        | \(O_{21}+O_{22}\) |
| Marginal      | \(O_{11}+O_{21}\) | \(O_{12}+O_{22}\) | \(N\)             |

5.1

A contingency table with \(n\) total observations follows a Multinomial Distribution. The likelihood function for observing counts \(O_{ij}\) with cell probabilities \(p_{ij}\) is: \[ L = \frac{n!}{\prod O_{ij}!} \prod p_{ij}^{O_{ij}} \]

If the treatment and outcome are not independent, then the MLE of \(p_{ij}\) is \(\frac{O_{ij}}{N}\).

If the treatment and outcome are independent, then we restrict \(p_{ij} = \frac{\text{Row total}}{N}\cdot\frac{\text{Column total}}{N}\), so the expected count is \(E_{ij} = p_{ij}\times N = \frac{\text{Row total}\times\text{Column total}}{N}\).

Please show that the LRT test statistic is

\[ -2 \ln \lambda =-2 \sum O_{ij} \ln\left( \frac{E_{ij}}{O_{ij}} \right) \]

\[ \begin{aligned} -2\ln(\lambda) =& -2\ln\Bigg( \frac{\prod \left( \frac{E_{ij}}{n} \right)^{O_{ij}}}{\prod \left( \frac{O_{ij}}{n} \right)^{O_{ij}}}\Bigg) \\ =& -2 \Big(\sum O_{ij} \ln\left( \frac{E_{ij}}{n} \right) - \sum O_{ij} \ln\left( \frac{O_{ij}}{n} \right)\Big) \\ =& -2\sum O_{ij} \ln\left( \frac{E_{ij}/n}{O_{ij}/n} \right) \\ =& -2\sum O_{ij} \ln\left( \frac{E_{ij}}{O_{ij}} \right) \end{aligned} \]
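With a made-up \(2\times2\) table (the counts below are illustrative only), the statistic is straightforward to compute:

```python
import numpy as np

O = np.array([[30.0, 10.0],
              [20.0, 40.0]])  # hypothetical observed counts
N = O.sum()

# Expected counts under independence: row total * column total / N.
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / N

# LRT statistic -2 ln(lambda) = -2 * sum(O_ij * ln(E_ij / O_ij)).
G = -2 * np.sum(O * np.log(E / O))
print(round(G, 3))  # compared against a chi-square with 1 df for a 2x2 table
```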

5.2

In the \(2 \times 2\) contingency table, please show that Pearson’s \(\chi^2\) test statistic (\(X^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)) is an approximation to the LRT obtained from a second-order Taylor expansion

Let \(\delta_i = O_i - E_i\) be the deviation of the observed count from the expected count.

Note that \(\sum \delta_i = 0\) (i.e. \(\sum O_i = \sum E_i\)) because the sum of observed counts must equal the sum of expected counts.

Rewrite the formula in 5.1 using \(\frac{O_i}{E_i} = \frac{E_i + \delta_i}{E_i} = 1 + \frac{\delta_i}{E_i}\); then

\[ G = 2 \sum O_i \ln\left(1 + \frac{\delta_i}{E_i}\right) \]

The Taylor series expansion for \(\ln(1+x)\) around \(x=0\) is:

\[ \ln(1+x) \approx x - \frac{x^2}{2} + \frac{x^3}{3} - \dots \]

Applying this to our term \(\ln\left(1 + \frac{\delta_i}{E_i}\right)\), where \(x = \frac{\delta_i}{E_i}\)

\[ \ln\left(1 + \frac{\delta_i}{E_i}\right) \approx \frac{\delta_i}{E_i} - \frac{\delta_i^2}{2E_i^2} \]

Substitute this approximation back into the \(G\) formula:

\[ \begin{aligned} G \approx& 2 \sum O_i \left( \frac{\delta_i}{E_i} - \frac{\delta_i^2}{2E_i^2} \right) \\ = & 2 \sum (E_i + \delta_i) \left( \frac{\delta_i}{E_i} - \frac{\delta_i^2}{2E_i^2} \right) \\ = & 2 \sum \left( \delta_i - \frac{\delta_i^2}{2E_i} + \frac{\delta_i^2}{E_i} - \frac{\delta_i^3}{2E_i^2} \right) \end{aligned} \]

Ignore the higher-order term (\(\delta_i^3\)) as it becomes negligible when \(O_i\) is close to \(E_i\).

Simplify the remaining terms

\[ \begin{aligned} G \approx& 2 \left( \sum \delta_i + \sum \frac{\delta_i^2}{2E_i} \right) \\ =& 2 \left( 0 + \frac{1}{2} \sum \frac{\delta_i^2}{E_i} \right) \\ =& \sum \frac{\delta_i^2}{E_i}\\ =& \sum \frac{(O_i - E_i)^2}{E_i} \end{aligned} \]
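The approximation can be seen numerically on a table whose counts sit close to their expectations (the counts below are made up for illustration):

```python
import numpy as np

O = np.array([[52.0, 48.0],
              [45.0, 55.0]])  # hypothetical counts, close to independence
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()

G = 2 * np.sum(O * np.log(O / E))  # LRT statistic
X2 = np.sum((O - E) ** 2 / E)      # Pearson's chi-square statistic

print(abs(G - X2) < 0.01)  # True: the two statistics nearly coincide
```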

5.3

We have discussed Pearson’s \(\chi^2\) test, LRT and Fisher exact test. Please comment about when to use which test.

  1. In large samples, Pearson’s \(\chi^2\) test and the LRT give similar results due to their shared asymptotic properties
  2. In relatively smaller samples, the LRT is the better choice, since Pearson’s \(\chi^2\) test is only an approximation of the LRT
  3. For really small samples (any \(O_{ij}\leq 5\)), the asymptotic property does not hold even for the LRT, so Fisher’s exact test is recommended

On the other hand, some other considerations:

  1. Fisher’s exact test is impractical for large sample sizes given the computational difficulty (imagine calculating \(\binom{120}{37}\))
  2. The LRT, again, is the more generalizable framework, in that no additional p-value adjustment is needed
  3. Pearson’s \(\chi^2\) test is the most popular given its closed-form solution (and it is relatively easy to calculate)